Learning mixtures of structured distributions over discrete domains
Authors
Abstract
Let C be a class of probability distributions over the discrete domain [n] = {1, . . . , n}. We show that if C satisfies a rather general condition – essentially, that each distribution in C can be well-approximated by a variable-width histogram with few bins – then there is a highly efficient (both in terms of running time and sample complexity) algorithm that can learn any mixture of k unknown distributions from C.

We analyze several natural types of distributions over [n], including log-concave, monotone hazard rate, and unimodal distributions, and show that they have the required structural property of being well-approximated by a histogram with few bins. Applying our general algorithm, we obtain near-optimally efficient algorithms for all these mixture learning problems, as described below. More precisely:

• Log-concave distributions: We learn any mixture of k log-concave distributions over [n] using k · Õ(1/ε^4) samples (independent of n) and running in time Õ(k log(n)/ε^4) bit-operations (note that reading a single sample from [n] takes Θ(log n) bit-operations). For the special case k = 1 we give an efficient algorithm using Õ(1/ε^3) samples; this generalizes the main result of [DDS12b] from the class of Poisson Binomial distributions to the much broader class of all log-concave distributions. Our upper bounds are not far from optimal, since any algorithm for this learning problem requires Ω(k/ε^{5/2}) samples.

• Monotone hazard rate (MHR) distributions: We learn any mixture of k MHR distributions over [n] using O(k log(n/ε)/ε^4) samples and running in time Õ(k log(n)/ε^4) bit-operations. Any algorithm for this learning problem must use Ω(k log(n)/ε^3) samples.

• Unimodal distributions: We give an algorithm that learns any mixture of k unimodal distributions over [n] using O(k log(n)/ε^4) samples and running in time Õ(k log(n)/ε^4) bit-operations. Any algorithm for this problem must use Ω(k log(n)/ε^3) samples.
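The structural condition above – that each distribution in C is well-approximated by a variable-width histogram with few bins – is easy to experiment with. Below is a minimal Python sketch, not the paper's algorithm: it flattens an empirical distribution over [n] on a greedy equal-mass partition into t intervals and reports the resulting total variation error. The function names and the greedy binning rule are illustrative assumptions; the paper chooses the partition more carefully to achieve the sample bounds stated above.

```python
import numpy as np

def empirical_pmf(samples, n):
    """Empirical distribution over the domain [n] = {1, ..., n}."""
    counts = np.bincount(samples, minlength=n + 1)[1:]
    return counts / counts.sum()

def equal_mass_bins(p, t):
    """Greedy partition of [n] into at most t intervals of roughly equal mass.
    Intervals are half-open index ranges (lo, hi) over the pmf array."""
    n, target = len(p), 1.0 / t
    bins, lo, mass = [], 0, 0.0
    for i in range(n):
        mass += p[i]
        if mass >= target and len(bins) < t - 1:
            bins.append((lo, i + 1))
            lo, mass = i + 1, 0.0
    if lo < n:
        bins.append((lo, n))
    return bins

def flatten_on_bins(p, bins):
    """Variable-width histogram: replace p by its average on each interval."""
    q = np.empty_like(p)
    for lo, hi in bins:
        q[lo:hi] = p[lo:hi].sum() / (hi - lo)
    return q

def tv(p, q):
    """Total variation distance between two pmfs on the same domain."""
    return 0.5 * np.abs(p - q).sum()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, t = 1000, 40
    # A binomial pmf shifted onto [n] is log-concave (it is a Poisson
    # Binomial distribution), so few bins should already suffice.
    samples = rng.binomial(n - 1, 0.3, size=200_000) + 1
    p_hat = empirical_pmf(samples, n)
    q = flatten_on_bins(p_hat, equal_mass_bins(p_hat, t))
    print(f"TV(empirical, {t}-bin histogram) = {tv(p_hat, q):.4f}")
```

Under these assumptions, increasing t drives the flattening error down quickly for log-concave inputs, which is the phenomenon the paper's general algorithm exploits.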
Similar papers
Learning Mixtures of Discrete Product Distributions using Spectral Decompositions
We study the problem of learning a distribution from samples, when the underlying distribution is a mixture of product distributions over discrete domains. This problem is motivated by several practical applications such as crowdsourcing, recommendation systems, and learning Boolean functions. The existing solutions either heavily rely on the fact that the number of mixtures is finite or have s...
Feature Selection Facilitates Learning Mixtures of Discrete Product Distributions
Feature selection can facilitate the learning of mixtures of discrete random variables as they arise, e.g. in crowdsourcing tasks. Intuitively, not all workers are equally reliable but, if the less reliable ones could be eliminated, then learning should be more robust. By analogy with Gaussian mixture models, we seek a low-order statistical approach, and here introduce an algorithm based on the...
Learning Mixtures of Product Distributions via Higher Multilinear Moments
Learning mixtures of k binary product distributions is a central problem in computational learning theory, but one where there are wide gaps between the best known algorithms and lower bounds (even for restricted families of algorithms). We narrow many of these gaps by developing novel insights about how to reason about higher order multilinear moments. Our results include: (1) An n^{O(k^2)}-time ...
Learning with Mixtures of Trees
One of the challenges of density estimation as it is used in machine learning is that usually the data are multivariate and often the dimensionality is large. Operating with joint distributions over multidimensional domains raises specific problems that are not encountered in the univariate case. Graphical models are representations of joint densities that are specifically tailored to address t...
Marginal Likelihood Integrals for Mixtures of Independence Models
Inference in Bayesian statistics involves the evaluation of marginal likelihood integrals. We present algebraic algorithms for computing such integrals exactly for discrete data of small sample size. Our methods apply to both uniform priors and Dirichlet priors. The underlying statistical models are mixtures of independent distributions, or, in geometric language, secant varieties of Segre-Vero...